Detecting Plagiarism in Text Documents through Grammar-Analysis of Authors

Authors

  • Michael Tschuggnall
  • Günther Specht
Abstract

The task of intrinsic plagiarism detection is to find plagiarized sections within text documents without using a reference corpus. In this paper, the intrinsic detection approach Plag-Inn is presented, which is based on the assumption that authors use a recognizable and distinguishable grammar to construct sentences. The main idea is to analyze the grammar of text documents and to find irregularities within the syntax of sentences, regardless of the concrete words used. Suspicious sentences are identified by computing the pq-gram distance between grammar trees and by utilizing a Gaussian normal distribution; if any are found, the algorithm tries to select and combine them into potentially plagiarized sections. The parameters and thresholds needed by the algorithm are optimized using genetic algorithms. Finally, the approach is evaluated against a large test corpus of English documents, showing promising results.
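The abstract only names the building blocks. As a rough illustration, the Python sketch below shows how three of them could fit together: a pq-gram profile and distance for grammar trees, flagging sentences whose mean distance to the rest of the document is unusually high under a fitted normal distribution, and merging nearby flagged sentences into sections. It is not the Plag-Inn implementation: the tuple-based tree format, the function names, the defaults p = 2 and q = 3, the mu + k·sigma cut-off and the `max_gap` merging rule are all assumptions made for illustration, and the toy trees stand in for real parser output.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical toy representation of a grammar (parse) tree:
# a node is (label, [child nodes]); real input would come from a parser.


def pq_gram_profile(tree, p=2, q=3):
    """Return the bag (Counter) of pq-grams of `tree`.

    Each pq-gram is a tuple of p stem labels (p - 1 ancestors plus the
    anchor node) followed by q base labels (consecutive children);
    "*" is the null label used to pad the extended tree.
    """
    profile = Counter()

    def visit(node, stem):
        label, children = node
        stem = stem[1:] + (label,)            # shift the anchor into the stem
        base = ("*",) * q
        if not children:                      # leaf: one pq-gram with an all-null base
            profile[stem + base] += 1
            return
        for child in children:                # slide the q-window over the children
            base = base[1:] + (child[0],)
            profile[stem + base] += 1
            visit(child, stem)
        for _ in range(q - 1):                # pad the window with trailing nulls
            base = base[1:] + ("*",)
            profile[stem + base] += 1

    visit(tree, ("*",) * p)
    return profile


def pq_gram_distance(t1, t2, p=2, q=3):
    """pq-gram distance: 1 - 2|P1 ∩ P2| / (|P1| + |P2|), with bag semantics."""
    p1, p2 = pq_gram_profile(t1, p, q), pq_gram_profile(t2, p, q)
    shared = sum((p1 & p2).values())          # Counter & = bag intersection (min counts)
    return 1.0 - 2.0 * shared / (sum(p1.values()) + sum(p2.values()))


def suspicious_sentences(trees, k=1.0, p=2, q=3):
    """Indices of sentences whose mean distance to all other sentences exceeds
    mu + k * sigma of a normal distribution fitted to those mean distances.
    The cut-off rule and `k` are illustrative assumptions."""
    if len(trees) < 2:
        return []
    means = [
        mean(pq_gram_distance(t, u, p, q) for j, u in enumerate(trees) if j != i)
        for i, t in enumerate(trees)
    ]
    mu, sigma = mean(means), pstdev(means)
    return [i for i, m in enumerate(means) if m > mu + k * sigma]


def merge_into_sections(indices, max_gap=1):
    """Merge flagged sentence indices into (start, end) sections, allowing up to
    `max_gap` unflagged sentences inside a section (hypothetical rule)."""
    sections = []
    for i in sorted(indices):
        if sections and i - sections[-1][1] <= max_gap + 1:
            sections[-1][1] = i
        else:
            sections.append([i, i])
    return [tuple(s) for s in sections]


if __name__ == "__main__":
    # Four structurally identical toy sentences and one with a different shape.
    regular = ("S", [("NP", [("PRP", [])]),
                     ("VP", [("VBZ", []), ("NP", [("DT", []), ("NN", [])])])])
    odd = ("SQ", [("VBZ", []), ("NP", [("PRP", [])]),
                  ("ADJP", [("RB", []), ("JJ", [])])])
    flagged = suspicious_sentences([regular] * 4 + [odd])
    print(flagged, merge_into_sections(flagged))  # prints [4] [(4, 4)]
```

Using a `Counter` keeps the bag semantics of pq-gram profiles explicit, since `&` takes the minimum count per pq-gram. Values such as `k`, `max_gap`, `p` and `q` are the kind of parameters and thresholds the abstract says are tuned with genetic algorithms; this sketch simply fixes them by hand.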

Similar resources

Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting

With due respect to the authors’ rights, plagiarism detection is one of the critical problems in the field of text mining that many researchers are interested in. This issue is considered a serious one in higher academic institutions. There exist language-free tools which do not yield any reliable results since the special features of every language are ignored in them. Considering the paucit...

English-Persian Plagiarism Detection based on a Semantic Approach

Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...

Automated Plagiarism Detection System for Malayalam Text Documents

In this paper, a tool for detecting plagiarism in Malayalam documents is presented. Many language-sensitive tools for detecting plagiarism in natural language documents have been developed, particularly for English. Detecting plagiarism in Malayalam documents is a particularly challenging task because of the complex linguistic structure of Malayalam. The plagiarism detectio...

Publication date: 2013